Morpho-biochemical characterization of a RIL population for seed parameters and identification of candidate genes regulating seed size trait in lentil (Lens culinaris Medik.)

Seed size and shape in lentil (Lens culinaris Medik.) are important quality traits because they influence milled grain yield, cooking time, and the market class of the grain. Linkage analysis for seed size was carried out in a RIL (F5:6) population of 188 lines (15.0 to 40.5 g/1000 seeds) derived by crossing L830 (20.9 g/1000 seeds) with L4602 (42.13 g/1000 seeds). A parental polymorphism survey using 394 SSRs identified 31 polymorphic primers, which were used for bulked segregant analysis (BSA). Marker PBALC449 differentiated the parents and the small-seeded bulk only; the large-seeded bulk, and the individual plants constituting it, could not be differentiated. Single plant analysis identified only six recombinants and 13 heterozygotes among the 93 small-seeded RILs (<24.0 g/1000 seeds). This showed that the small seed size trait is strongly regulated by a locus near PBALC449, whereas the large seed size trait appears to be governed by more than one locus. The PCR products amplified by PBALC449 (149 bp from L4602 and 131 bp from L830) were cloned, sequenced, and BLAST-searched against the lentil reference genome, and were found to originate from chromosome 3. The neighbouring region on chromosome 3 was then searched, and a few candidate genes with a role in seed size determination were identified, namely ubiquitin carboxyl-terminal hydrolase, E3 ubiquitin ligase, TIFY-like protein, and hexosyltransferase. A validation study in another RIL mapping population differing in seed size, using a whole genome resequencing (WGRS) approach, revealed a number of SNPs and InDels in these genes. Biochemical parameters such as cellulose, lignin, and xylose content showed no significant differences between the parents and the extreme RILs at maturity.
Various seed morphological traits such as area, length, width, compactness, volume, and perimeter, measured using VideometerLab 4.0, differed significantly among the parents and RILs. These results improve the understanding of the region regulating the seed size trait in a genomically less explored crop such as lentil.


Introduction
Lentil (Lens culinaris ssp. culinaris Medik.) is a diploid (2n=14), self-pollinated, cool-season legume crop with a genome size of nearly 4.2 Gb (Arumuganathan and Earle, 1991; Dikshit et al., 2022a; Dikshit et al., 2022b). It is rich not only in proteins but also in micronutrients (Fe and Zn) and β-carotene (Mishra et al., 2020; Priti et al., 2021; Priti et al., 2022). Lentil is grown throughout the world in temperate to sub-tropical regions, including the Middle East, north-eastern Africa, Southern Europe, South and North America, Australia, and the Indian sub-continent. Globally, Canada is the largest producer and exporter of lentils. Lentil is an important crop for India, with an acreage of 1.35 m ha and a production of 1.18 m tons. World production of lentils is 6.54 m tons from an area of nearly 5.01 m ha. Lentil productivity in India (871.5 kg/ha) is well below world productivity (1304.9 kg/ha) (FAOSTAT, 2020).
Seed quality of lentil is an important objective for both industry and the consumer. Among various parameters, seed size is the key parameter defining overall lentil quality. During the domestication of lentil, several traits such as pod dehiscence, dormancy, and seed size were modified, which ultimately allowed farmers to collect seed easily for sowing in the following year (Sonnante et al., 2009). Most domestication traits, such as pod dehiscence, dormancy, and growth habit, are governed by single genes, whereas seed size is a quantitative trait. Depending on seed size, lentil is classified into the microsperma type (2 to 6 mm diameter, red and yellow cotyledons, and pigmented flowers) and the macrosperma type (6 to 9 mm diameter, yellow cotyledons, and non-pigmented flowers) (Barulina, 1930; Sandhu and Singh, 2007). Generally, microsperma types are more common in southeast Asia, while macrosperma types are more common in western Asia and Europe (Barulina, 1930). Previous genetic studies revealed large variation for seed weight and seed diameter in lentils (Tullu et al., 2001; Dutta et al., 2022; Tripathi et al., 2022). Seed size and shape are known to influence both cooking time and dehulling efficiency and are considered important market-associated traits (Wang, 2008). A strong positive correlation (r=0.96) was recorded between seed size and cooking time (Hamdi et al., 1991). Ford et al. (2007) noted reduced damage during handling in lentil cultivars with rounder seeds compared with thin, sharp-edged types. Thus, the development of genotypes with improved seed parameters, including seed weight, is an important objective of lentil breeders across the globe. Generally, seed parameters are measured using crude phenotypic evaluation methods such as 100-grain weight, or seed diameter measured with a Vernier caliper or graded sieves (Hossain et al., 2010; Xu et al., 2011).
In lentils, seed diameter has also been measured using computer-assisted 2-dimensional imaging (Shahin and Symons, 2001), and seed plumpness has been determined by 3-dimensional imaging with a camera (Shahin et al., 2006). However, these methods are laborious, especially when a large number of genotypes must be screened. Recently, Dutta et al. (2022) used a much simpler method based on the VideometerLab 4.0 instrument to measure various seed parameters in lentils.
Molecular markers linked to a trait of interest help in breeding efficiently for that trait (Mishra et al., 2003). Seed weight is known to be governed by several genes, so identifying markers linked to seed weight QTLs will improve selection for this trait. This will also speed up the development of new varieties with the desired seed parameters (Tripathi et al., 2022). In addition, implementing a molecular breeding approach for seed size requires an experimental population developed from contrasting parents, so that linkage can be established between the marker and the trait. Evaluation of a RIL population with SSR markers using the BSA approach can help identify markers linked to the seed size trait in lentil (Mishra et al., 2001; Mishra et al., 2003). This study hypothesizes that the genomic region controlling seed size can be identified using molecular markers in a mapping population differing for the seed size trait. Against this backdrop, the objective of this study was to perform the morpho-biochemical characterization of a RIL population for seed parameters and to identify candidate genes regulating the seed size trait in lentil.

Plant materials
Two lentil genotypes differing significantly in seed size, L830 (small-seeded; mean 1000 seed weight = 20.9 g) and L4602 (large-seeded; mean 1000 seed weight = 42.13 g), were used as the parents for the development of a RIL population (Figure 1). A cross was made between L4602 and L830, and the hybridity of the F1 was confirmed using polymorphic SSR markers. The parents and the RIL population (F5:6) of 188 lines were grown during rabi-2021 in the fields of the Indian Agricultural Research Institute, New Delhi, India (Latitude: 28.6412°N, Longitude: 77.1627°E, Altitude: 228.61 m AMSL) at a spacing of 30×5 cm (row to row × plant to plant) in rows 5.0 m long using standard cultivation practices. Each row was harvested at maturity, and 1000 seed weight was measured for the parents and the RILs (Figures 2, S1).

DNA extraction and constitution of bulks for bulked segregant analysis
Nearly 15-20 seeds from each of the 188 RILs and the parents (L830 and L4602) were placed on germination paper and wrapped in butter paper. They were then kept in a germination chamber for 8-10 days at 20-25°C. The tender seedlings were used for DNA isolation by the CTAB method (Murray and Thompson, 1980); DNA quality was checked on a 0.8% agarose gel, and quantity was measured using a Nanodrop (García-Alegría et al., 2020). Ten extreme genotypes each from the small-seeded RILs (line No. 05, 14, 16, 64, 88, 111, 117, 155, 160, 169) and the large-seeded RILs (line No. 03, 86, 87, 97, 102, 107, 108, 115, 133, 190) were used for the BSA (Figure 3). An equal quantity of DNA (20 ng/µL) was taken from each of the 10 extreme RILs and mixed to constitute the two contrasting bulks (B1 and B2). A total of 394 SSRs were used for the parental polymorphism survey (Table S1); the polymorphic primers were used for BSA (Michelmore et al., 1991), and the bands were separated on a 3.0% Metaphor agarose gel and scored. The SSRs differentiating the bulks and the parents were used for the individual RIL analysis. The RILs were arranged in increasing order of seed size, PCR was performed, and the amplification products were visualized on the gel using a gel documentation system.
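The selection of the two phenotypic tails for BSA can be sketched in a few lines of Python. This is only an illustration of the idea of picking the n lightest and n heaviest lines; the seed weights, line numbers, and the function name `pick_extremes` are hypothetical and are not the study's data or software.

```python
# Hypothetical 1000-seed weights (g) keyed by RIL number; NOT the study's data.
seed_weight = {1: 22.4, 2: 38.1, 3: 39.0, 4: 16.2, 5: 15.9, 6: 27.3, 7: 19.8, 8: 35.6}


def pick_extremes(weights, n):
    """Return (small, large): IDs of the n lightest and n heaviest lines."""
    ranked = sorted(weights, key=weights.get)  # ascending by seed weight
    if 2 * n > len(ranked):
        raise ValueError("tails would overlap; reduce n")
    return ranked[:n], ranked[-n:]


# The study used n=10 per bulk; n=2 keeps the toy example readable.
small_bulk, large_bulk = pick_extremes(seed_weight, n=2)
print("small-seeded bulk:", small_bulk)
print("large-seeded bulk:", large_bulk)
```

Each selected line would then contribute an equal quantity of DNA to its bulk, as described above.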

Cloning and sequencing of a PCR amplified product
The DNA fragments amplified by the SSR marker PBALC449 in L4602 and L830 were used for cloning and sequencing. The amplified bands were first precisely excised from the gel with a clean, sharp scalpel, and the DNA was then extracted using the QIAquick Gel Extraction Kit (QIAGEN, Valencia, USA) following the manufacturer's instructions (Sambrook et al., 1989). The amplified product was ligated into the pJET1.2/blunt vector using the CloneJET PCR Cloning Kit (Thermo Fisher Scientific™) as per the supplied protocol (https://www.thermofisher.com/document-connect/document-connect.html). The recombinant vector was then transformed into competent cells of E. coli strain DH5α using the standard protocol. Afterward, plasmid was isolated using the FavorPrep Plasmid Extraction Mini Kit as per the manufacturer's instructions, and the extracted plasmid DNA was stored at -20°C for further analysis. Cloning was confirmed by restriction digestion with Bgl II. The positive clones were sequenced on a Sanger sequencing platform using a universal primer. The raw sequence data were processed by trimming the vector sequence and aligned to the reference genome (CDC Redberry Genome Assembly v2.0; Ramsay et al., 2021) using the NCBI BLAST browser.

Biochemical analysis of lentil genotypes and the extreme RILs differing for seed size
Various cell wall-related biochemical analyses were performed on the 10 extreme RILs each for seed size (large- and small-seeded RILs) and the parents. Seeds of the small-seeded (L830, mean 1000 seed weight = 20.9 g) and large-seeded (L4602, mean 1000 seed weight = 42.13 g) parents were used for the seed size analysis.

Preparation of alcohol insoluble residue sample
Briefly, 600 mg of lentil seeds were crushed, flash frozen (in liquid N2), and ground to a fine powder in a Qiagen TissueLyser II (at 30 Hz for 2-3 min). Then 100 mg of powder was incubated (at 70°C for 30 min) in 5.0 mL ethanol (80%) containing 4.0 mM HEPES buffer (pH 7.5). The mixture was cooled on ice and centrifuged (1000 rpm for 15.0 min), the supernatant was discarded, and the residue was washed (5.0 mL 70% ethanol), suspended in chloroform:methanol (1:1) solution (5.0 mL) for 3.0 min at room temperature, and centrifuged (14000 rpm for 15.0 min). The residue was washed again with acetone (5.0 mL), and the pellet was dried in a desiccator and used as the alcohol insoluble residue (AIR) sample for further analysis (Pawar et al., 2017).

Estimation of cellulose by Updegraff method
To the AIR sample (2.0 mg), Updegraff reagent (acetic acid: nitric acid: water = 8:1:2 v/v) was added and incubated (at 100°C for 30 min). The mixture was then centrifuged (10,000 rpm; 15 min), and the pellet was washed four times with acetone and dried overnight. The dried residue was hydrolyzed in 72% H2SO4, glucose was analyzed by the anthrone assay (Updegraff, 1969), and a standard curve was used to estimate the cellulose content.

Estimation of xylose and O-acetyl content
AIR sample (2.0 g) was incubated for neutralization with HCl and NaOH for xylose and acetyl content estimation, respectively. The xylose and O-acetyl contents were analyzed using the Megazyme K-XYLOSE and K-ACET kits, respectively (Rastogi et al., 2022).

Acetyl bromide soluble lignin content
The 25% acetyl bromide solution was diluted using acetic acid and incubated at 50°C for 2.0 h. The solubilized powder was mixed with NaOH and hydroxylamine hydrochloride, absorbance was recorded at 280 nm, and lignin content was calculated (Foster et al., 2010).

The 10 extreme RIL genotypes for seed size that were used to form the two extreme bulks for the BSA. The upper panel shows the 10 RILs with maximum seed size (in descending order of seed weight) and the lower panel shows the 10 RILs with minimum seed size (in ascending order of seed weight).

Lignin and cellulose estimation through fourier transform-infrared spectroscopy
Lignin and cellulose contents were estimated using FTIR spectroscopy in the lentil seed powder (Pawar et al., 2017). A Tensor FTIR spectrometer (Bruker Optics) equipped with a single-reflectance horizontal ATR cell (ZnSe Optical Crystal, Bruker Optics) was used for the analysis. The selected spectral range was 600 cm⁻¹ to 4000 cm⁻¹ at a resolution of 4 cm⁻¹. KBr powder was used for the preparation of the standard, each sample was measured twice (by removing and adding different aliquots of powder for heterogeneity evaluation), and each spectrum was the average of 16 scans (Labbe et al., 2005; Canteri et al., 2019).

Statistical analysis
ANOVA was performed to determine the genotypic variance among parents and the 188 RIL genotypes (for various seed parameters like seed weight, area, length, width, width/length, compactness, width/area, volume, and perimeter) and also among the parents and the 10 extreme RIL genotypes (for biochemical parameters like lignin, cellulose, and xylose contents) using DSAASTAT ver.1.514 software. Afterward, multiple comparison test was performed using Tukey HSD method (p ≤ 0.05).
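For readers unfamiliar with the first step of this analysis, the F statistic of a one-way ANOVA can be sketched in plain Python as below. This is not the DSAASTAT procedure used in the study, the Tukey HSD comparison is not implemented, and the group labels and values in `demo` are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of label -> list of values."""
    all_values = [v for vals in groups.values() for v in vals]
    grand = mean(all_values)
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(vals) * (mean(vals) - grand) ** 2 for vals in groups.values())
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((v - mean(vals)) ** 2 for vals in groups.values() for v in vals)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical replicate measurements for three genotypes (illustration only).
demo = {"line_A": [1.0, 2.0, 3.0], "line_B": [2.0, 3.0, 4.0], "line_C": [5.0, 6.0, 7.0]}
f_stat, df_b, df_w = one_way_anova_f(demo)
print(f_stat, df_b, df_w)
```

A large F relative to the critical value for (df_between, df_within) indicates genotypic differences, after which a post hoc test such as Tukey HSD groups the genotypes.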

Results
Identification of linked marker(s) with seed size in RIL population using BSA
For the parental polymorphism survey, 394 SSR primer pairs of different series such as PBALC (Kaur et al., 2011), PLC (Jain et al., 2013), LC (Verma et al., 2014), and GLLC (Saha et al., 2010) were used, and 31 were found to be polymorphic (Tables 1, S1; Figure S2). Bulked segregant analysis (BSA) with the polymorphic markers was performed on the parents and the two bulks made by mixing equal quantities of DNA from the 10 extreme RIL genotypes for seed size (Figure 3). Of the 31 polymorphic SSRs, only one, PBALC449, clearly differentiated the small seed size bulk and the parent, whereas the large seed size bulk showed two bands. The other polymorphic markers could not differentiate the bulks (Figure 4). The PBALC449 primer amplified a 131 bp band in L830 and a 149 bp band in L4602. To understand this unusual banding pattern, the individual plants constituting the bulks were amplified. As observed for the bulked sample, all 10 plants of the small seed size bulk showed a band identical to that of the small-seeded parent, L830 (131 bp). However, among the individual plants constituting the large seed size bulk, a mix of amplification patterns was recorded, with three recombinants (having the L830 band size) and two heterozygotes (Figure 5).
Afterward, the DNA samples of the RILs were rearranged in order of increasing seed weight, and PCR (and gel electrophoresis) was carried out using the PBALC449 primer. Of the 188 RILs, 39 lines showed the 149 bp amplicon (L4602 type), 43 lines were heterozygous (both 149 and 131 bp bands), and 106 lines showed the 131 bp amplicon (L830 type) (Figure S3, Table S2). Based on seed size, the RILs were broadly grouped into two categories, (i) >24.0 g/1000 seeds (large-seeded; 95 lines) and (ii) <24.0 g/1000 seeds (small-seeded; 93 lines), on the expectation that the major locus would have been fixed in a 1:1 ratio in the RILs. When the RILs were arranged in increasing order of seed weight, the first 73 RILs (15.0 to 21.4 g/1000 seeds) included only three recombinants (and seven heterozygotes), while the first 93 lines included only six recombinants (and 13 heterozygotes). This banding pattern suggested very tight linkage between the small seed size trait and the PBALC449 marker and also indicated that there is no marker distortion in the studied population (Table 2).
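The counts reported above can be cross-checked with a short tally. The sketch below uses only the numbers given in the text; the variable names are illustrative, and the "recombinant fraction" is a simple proportion among the small-seeded lines, not a map distance estimated by linkage software.

```python
# Genotype classes at PBALC449 across all RILs, as reported in the text.
counts = {"L4602_type_149bp": 39, "heterozygous": 43, "L830_type_131bp": 106}
total_rils = sum(counts.values())

# Small-seeded group (<24.0 g/1000 seeds), as reported in the text.
small_seeded = 93
recombinants = 6      # carry the L4602-type band despite small seed
heterozygotes = 13    # carry both bands
parental_type = small_seeded - recombinants - heterozygotes

recombinant_fraction = recombinants / small_seeded

print("total RILs genotyped:", total_rils)
print("small-seeded lines with the L830 band only:", parental_type)
print("recombinant proportion among small-seeded lines:", round(recombinant_fraction, 3))
```

The three genotype classes add up to the 188 lines of the population, and fewer than one in ten small-seeded lines carries the L4602-type band alone.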
Interestingly, this marker showed independent segregation for the large-seeded trait. Thus, it seems that large seed size expression is governed by two or more major loci. Because the banding pattern was so unusual, we were unable to use any standard marker linkage analysis. To decipher this banding pattern, we decided to find the chromosomal location of the amplified product (tightly linked with the small seed size trait only) by cloning, sequencing, and a comparative genomics approach.

Cloning, sequencing and chromosomal location of PCR amplified products from PBALC449 marker
The pJET1.2 vector was used to clone the PCR products (149 bp from L4602 and 131 bp from L830) amplified by PBALC449, the marker putatively linked to the small seed size trait in lentil. The cloned fragments were sequenced and aligned to the reference genome (CDC Redberry Genome Assembly v2.0) using the NCBI BLAST browser. The difference in total length between the amplified products of the two parents was due to the presence of an 18 bp deletion at two places (Table S3). The alignment details of the amplified product with the reference genome are presented in Figures S4-S5. The PBALC449 amplicon mapped to Luc.2RBY.Chr3:398437705.398441563 (+strand), which is a PsbP domain protein-encoding gene (3859 bp) on chromosome 3 of lentil (Figures 6, S6). Sequence similarity with related species was highest for Medicago truncatula, followed by Cicer arietinum (Figure S7). To identify candidate genes regulating the small seed size trait near this marker, we checked the RNA-Seq data that we had generated from the same parental combination (Dutta et al., 2022) and the physical chromosomal details available for CDC Redberry Genome Assembly v2.0 (Ramsay et al., 2021). Using the KnowPulse browser, three candidate genes were found on the left side (0.6 Mb region) of the PBALC449 amplified region, namely E3 ubiquitin ligase (log2FC -1.582), TIFY-like protein, and hexosyltransferase (log2FC -2.474), while a ubiquitin carboxyl-terminal hydrolase gene was found on the right side (0.7 Mb region) (Ramsay et al., 2021).
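The flanking search window described above can be written out as simple coordinate arithmetic. The sketch below assumes the reported coordinates are 1-based and inclusive (an assumption, although it reproduces the stated 3859 bp gene length); the helper `in_window` and the example feature coordinates are hypothetical and do not represent the KnowPulse interface.

```python
# Coordinates of the PBALC449 hit on chromosome 3, as reported in the text.
MARKER_START = 398_437_705
MARKER_END = 398_441_563
LEFT_FLANK = 600_000   # 0.6 Mb searched on the left side
RIGHT_FLANK = 700_000  # 0.7 Mb searched on the right side

# Assumes 1-based inclusive coordinates.
gene_length = MARKER_END - MARKER_START + 1
search_window = (MARKER_START - LEFT_FLANK, MARKER_END + RIGHT_FLANK)


def in_window(start, end, window=search_window):
    """True if a feature spanning [start, end] overlaps the search window."""
    return start <= window[1] and end >= window[0]


print("gene length (bp):", gene_length)
print("search window on Chr3:", search_window)
# Hypothetical feature coordinates, for illustration only.
print(in_window(398_000_000, 398_001_000))
print(in_window(400_000_000, 400_001_000))
```

Any annotated gene whose coordinates overlap this window would be a positional candidate, to be prioritized further using expression data such as the log2FC values quoted above.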

Estimation of lignin, cellulose, xylose, and acetyl content in the parents and the 10 extreme RILs differing for the seed size
Cell wall composition is known to determine the size and shape of some seeds. To examine this, we analyzed and compared the cell wall composition in the mature seeds of the parents (L4602 and L830), 10 extreme large-seeded RILs (No. 39, 86, 87, 97, 102, 107, 108, 115, 133, 190) and 10 extreme small-seeded RILs (No. 5, 14, 16, 64, 88, 117, 155, 160, 168, 169) (Table 3). Nearly similar FT-IR cellulose content was recorded in the parental genotypes L4602 (24.07%) and L830 (25.96%), while FT-IR cellulose content in the large-seeded RILs (21.25 to 28.63%) was relatively lower than in the small-seeded RILs (22.42 to 39.16%). Lignin is a phenolic polymer that gives rigidity to the cell wall, and FT-IR lignin content was higher in the small-seeded parental genotype L830 (12.80%) than in the large-seeded parental genotype L4602 (11.16%). Similar observations were recorded for the small-seeded RILs, which showed relatively more lignin (10.73 to 26.85%) than the large-seeded RILs (11.16 to 15.4%). Acetyl bromide soluble lignin (ABSL) content was higher in the small-seeded parental genotype L830 (25.07%) than in the large-seeded parental genotype L4602 (21.98%). Similarly, the small-seeded RILs showed more ABSL content (1.256 to 4.546%) than the large-seeded RILs (1.082 to 2.07%).
Xylan content was lower than cellulose or lignin content in the seeds of the parental genotypes and was higher in the small-seeded genotype L830 (6.86 mg/g) than in the large-seeded genotype L4602 (4.16 mg/g). Similarly, xylan content ranged from 2.219 to 7.152 mg/g in the large-seeded RILs and from 3.08 to 12.18 mg/g in the small-seeded RILs. Acetyl content was higher in the small-seeded genotype L830 (4.02 mg/g) than in the large-seeded genotype L4602 (2.013 mg/g). Similarly, acetyl content ranged from 2.013 to 6.138 mg/g in the large-seeded RILs and from 4.02 to 10.232 mg/g in the small-seeded RILs (Table 3). In general, cellulose was the most abundant cell wall component in lentil seeds. Overall, higher values for almost all the studied cell wall components were recorded for the small-seeded genotype L830 than for the large-seeded genotype L4602.

Gel picture showing BSA for seed size with PBALC449 marker. P1: L4602, P2: L830, B1: large-seeded bulk, B2: small-seeded bulk, M: DNA ladder.

Characterization of parental genotypes and the RILs using VideometerLab 4.0 for various seed parameters
The lentil parental genotypes L4602 (42.13 g/1000 seeds) and L830 (20.90 g/1000 seeds) and the 10 extreme RILs of each class (large-seeded RILs: 34.7-39.2 g/1000 seeds; small-seeded RILs: 16.16-20.1 g/1000 seeds), which differed significantly in mean 1000 seed weight, were used for the study. Various other seed parameters such as area, length, width, width/length, compactness, width/area, volume, and perimeter were also measured using the VideometerLab 4.0 instrument and showed significant variation among the studied genotypes (Table 4). Images of the lentil genotypes (L830 and L4602) captured by VideometerLab 4.0 at 19 different wavelengths (375 to 970 nm) for further seed parameter analysis are given in Figure S8. The mean seed area (mm2), length (mm), width (mm), and perimeter (mm) of the parental genotypes L4602 (22.59, 5.57, 5.24, and 15.47, respectively) and L830 (11.02, 3.82, 3.71, and 10.66, respectively) differed significantly (Table 4). In addition, the 10 RILs at each extreme for seed size were compared through one-way ANOVA and grouped using the Tukey HSD method (p ≤ 0.05). For the large-seeded RILs, the studied seed parameters (area: 17.36-20.82 mm2, length: 4.9-5.3 mm, width: 4.56-5.05 mm, perimeter: 13.47-14.75 mm) were significantly higher than for the small-seeded RILs (area: 8.88-11.03 mm2, length: 3.51-3.83 mm, width: 3.28-3.71 mm, perimeter: 9.78-10.7 mm). ANOVA was also performed for all 188 RILs (including the parents), and details are presented in Table S4. A representative image (Figure S7) shows four large-seeded and four small-seeded lentil RIL genotypes as captured by VideometerLab 4.0 at two wavelengths (590 and 850 nm).

Putative genes identified regulating seed size on both sides (1.2 Mb region) of the PBALC449 in the lentil genome (derived from Mortimer et al., 2010; Stoppel et al., 2012; Li and Li, 2014; Ge et al., 2016; Wang et al., 2018).

Values represent mean ± SD at P ≤ 0.05. Same lower-case letters within a column are not significantly different. The values in bold represent the higher and lower values.

Validation of identified candidate genes in a RIL mapping population
Another mapping population (RIL; F3:4), derived from the cross between Globe mutant (1000 seed weight = 13.6 g) and L4775 (1000 seed weight = 28.47 g), was used for the validation. Two contrasting bulks, each made from 20 plants extreme for seed weight (small seed bulk: 1000 seed weight = 18.57 g; bold seed bulk: 1000 seed weight = 24.46 g), along with one parent (Globe mutant), were used for whole genome resequencing (WGRS). Detailed sequence analysis identified 90 SNPs/InDels in the four candidate genes identified through the PBALC449 marker (Table S5).
For the E3 ubiquitin ligase gene, three SNPs were identified, whereas for the TIFY-like protein gene, 34 SNPs and one InDel were identified, most of which showed a modifier effect. Among the 20 SNPs and two InDels of the hexosyltransferase gene, one InDel was a disruptive in-frame deletion with moderate effect, while most of the SNPs were missense variants with moderate effect. Similarly, for the ubiquitin carboxyl-terminal hydrolase gene, we identified 30 SNPs, most of which showed a modifier effect (Table S5).
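The per-gene variant counts can be tallied to confirm that they add up to the reported total. The sketch below uses the counts from the text; entering zero InDels for the two genes where only SNPs are mentioned is an assumption made for the tally.

```python
# Variant counts per candidate gene, as reported in the text.
# InDel = 0 is an ASSUMPTION for genes where the text mentions SNPs only.
variants = {
    "E3 ubiquitin ligase": {"SNP": 3, "InDel": 0},
    "TIFY-like protein": {"SNP": 34, "InDel": 1},
    "Hexosyltransferase": {"SNP": 20, "InDel": 2},
    "Ubiquitin carboxyl-terminal hydrolase": {"SNP": 30, "InDel": 0},
}

per_gene = {gene: sum(kinds.values()) for gene, kinds in variants.items()}
total_snps = sum(kinds["SNP"] for kinds in variants.values())
total_indels = sum(kinds["InDel"] for kinds in variants.values())
grand_total = total_snps + total_indels

for gene, n in per_gene.items():
    print(f"{gene}: {n} variants")
print("SNPs:", total_snps, "InDels:", total_indels, "total:", grand_total)
```

Under that assumption the tally gives 87 SNPs and 3 InDels, which agrees with the 90 SNPs/InDels stated for the four genes.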

BSA and identification of candidate genes regulating seed size trait in lentil
A total of 394 diverse SSR primer pairs (Saha et al., 2010; Kaur et al., 2011; Jain et al., 2013; Verma et al., 2014) were used, and 31 were found to be polymorphic, which is 7.9% of the total primers used. A similar level of polymorphism was also reported by previous workers (Singh et al., 2019). Of all the polymorphic SSRs, only one (PBALC449) could differentiate the small seed size bulk and the parent, while the large seed size bulk showed two bands. This kind of polymorphism pattern has not previously been reported in lentil. Detailed analysis of the 188 RILs using the PBALC449 marker showed that the region near this marker seems to regulate the small seed size trait, while large seed size is governed by more than one locus. Moreover, quantitative regulation of the seed size trait has been reported by a number of workers (Fedoruk et al., 2013; Verma et al., 2015; Khazaei et al., 2018).
To understand this unusual banding pattern and to find the chromosomal location of the amplified product (tightly linked with the small seed size trait only), cloning, sequencing, and comparative genomics approaches were used. The PCR amplified fragment was cloned, sequenced, and aligned to the recently released lentil reference genome (CDC Redberry Genome Assembly v2.0) (Ramsay et al., 2021). The PCR product amplified by PBALC449 aligned at Luc.2RBY.Chr3:398437705.398441563 (+strand) on chromosome 3, which is a PsbP domain protein-encoding gene (3859 bp) (Figure S5). In earlier studies, Verma et al. (2015) identified three major QTLs for seed weight and seed size traits in lentils on LG4; Fedoruk et al. (2013) identified three QTLs for seed diameter on LG1, 2, and 7, which together explained >60% of the PVE; and Khazaei et al. (2018) identified two SNPs associated with seed diameter (LcC09638p190 and LcC08798p992) on chromosomes 1 and 7, respectively. In addition, QTLs for seed weight (Abbo et al., 1991) and seed diameter (Fratini et al., 2007) have been identified in lentils.
Further, to identify candidate genes regulating the small seed size trait near this marker (1.4 Mb region), we analyzed our RNA-Seq data (Dutta et al., 2022). On the left side of the PBALC449 amplified region (0.6 Mb), three genes were found, namely E3 ubiquitin ligase (log2FC -1.582), hexosyltransferase (log2FC -2.474), and TIFY-like protein, while on the right side (0.7 Mb) a ubiquitin carboxyl-terminal hydrolase gene was found (Figure 6). The E3 ubiquitin ligase gene is known to have a role in controlling cell division (Li and Li, 2014), while the TIFY-like protein gene has a role in regulating plant development (Ge et al., 2016). Similarly, the hexosyltransferase gene is known to have a role in the regulation of xylan synthesis (Mortimer et al., 2010), while the ubiquitin carboxyl-terminal hydrolase gene is required for periodic maintenance of the circadian clock (Hayama et al., 2019) and for inflorescence architecture (Yang et al., 2007) in Arabidopsis. Domoney et al. (2006) reported two distinct phases during seed development in legumes. In the first phase, cell division in the seed depends on the embryo genotype, with certain loci controlling cotyledon cell number, and is largely insensitive to environmental cues. Thus, this phase mainly controls seed diameter and seed plumpness. The second phase regulates seed thickness via cell expansion, which is highly influenced by the environment and is mainly regulated by photosynthate partitioning loci. Seed size is reportedly influenced by both the pre-anthesis and post-anthesis periods (Gupta et al., 2006), as these affect the amount of assimilates partitioned to the developing seeds (pre-anthesis) and the time available for seed maturation (post-anthesis), either of which could alter seed size. Flowering time and other flower morphology-related loci are also known to control seed size in model legume crops (Ohto et al., 2005; He et al., 2010; Wang et al., 2012).
In chickpea, a major flowering time gene, PPD, is reported to affect the seed weight, and early flowering results in reduced seed weight (Hovav et al., 2003). Validation results in another mapping population (Globe mutant × L4775) using WGRS also confirmed the presence of SNPs and InDels in the four candidate genes. However, there is still a need to validate these candidate genes having a role in the seed size regulation, in different lentil genotypes for its ultimate application in the breeding program aiming for seed size improvement.

Seed biochemical parameters
Seed size and shape are regulated by the cell wall composition in lentils (Dutta et al., 2022). However, no other detailed report mentioning the relationship between the seed size and cell wall composition including cellulose, lignin, and xylose in lentils is known. The data of parents and the 10 extreme RILs for the cell wall composition in the mature seeds showed significant variations for parameters like FT-IR cellulose, FT-IR lignin, ABSL, xylan, and acetyl content (Table 3). In small seeded RILs, in general, more of FT-IR cellulose (22.4 to 39.16%), FT-IR lignin (10.73 to 26.85%), ABSL (1.26 to 4.55%), xylose content (3.083 to 12.18 mg/g), acetyl content (4.02 to 10.23 mg/g) was recorded than the large-seeded RILs (FT-IR cellulose: 21.25 to 28.6%; FT-IR lignin: 11.16 to 15.4%; ABSL: 1.08 to 2.19%; xylose content: 2.22 to 7.15 mg/g; acetyl content: 2.01 to 6.14 mg/g). Overall, cellulose was recorded as the most abundant cell wall component in lentil seeds. Similarly, cellulose and hemicellulose such as galactomannan, mannan, and xyloglucan were found to play a crucial function in determining the shape and size of both developing and mature seeds (Buckeridge, 2010).
This study recorded up to 39.16% cellulose (FT-IR) in lentil seeds, whereas in different plant species nearly 40-60% cellulose was reported (Costa and Plazanet, 2016). In the RILs, 10.73 to 26.85% lignin (FT-IR) was recorded whereas 5.13% mean lignin content was recorded in soybean seeds (Krzyzanowski et al., 2008), and genotypes having >5% lignin in the seed coat were less prone to mechanical damage (Alvarez et al., 1997). The presence of more lignin in lentil seeds over soybean may be due to the presence of more colored compounds in the lentil seed coat (Dutta et al., 2022). Xylose and xyloglucan are considered important seed storage polysaccharides, especially in developing seeds (Buckeridge, 2010). The studied lentil genotypes showed 2.22 to 12.18 mg/g xylose content, whereas 3.5-4.5% (dry weight basis) acetyl-xylose content was recorded in the hardwoods (Teleman et al., 2002). Acetyl content in the range of 2.01 to 10.23 mg/g was recorded in the studied RILs. Differential seed sizes in different genotypes might be due to the different levels of polysaccharides acetylation which seems to affect their water solubility, interactions with cellulose, and various other physicochemical properties (Busse-Wicher et al., 2014).
RNA-seq results of Dutta et al. (2022) identified various cell wall-associated GO terms and also the differential expression of a xyloglucan endotransglucosylase-encoding gene, suggesting their involvement in cell wall synthesis during seed development in lentils; similar results were also recorded in soybean (Du et al., 2017). Overall, the higher values for almost all the studied cell wall components in the small-seeded lentil genotype (L830) than in the large-seeded genotype (L4602) need further detailed stage-specific investigation.
Characterization of lentil genotypes using VideometerLab 4.0 for various seed parameters
In general, seed size in lentils is mostly determined by the rather crude method of measuring 100- or 1000-seed weight (Tullu et al., 2001). Even in soybean, seed shape parameters are measured using a caliper (Xu et al., 2011), while in chickpea Hossain et al. (2010) used graded sieves to determine seed size and shape. With these methods, it is impossible to determine seed thickness or seed plumpness (Dutta et al., 2022). In this study, however, the VideometerLab 4.0 instrument was used to measure seed parameters such as area, length, width, width/length, compactness, width/area, volume, and perimeter for all 188 RILs and their parents. Most of the studied parameters showed significant variation among the studied genotypes (Table S4). For the large-seeded RILs and parent, 1000-seed weight (34.7 to 42.13 g), area (17.36 to 22.59 mm²), seed length (4.9 to 5.57 mm), seed width (4.56 to 5.24 mm), and seed perimeter (13.47 to 15.47 mm) were significantly higher than for the small-seeded RILs and parent (1000-seed weight: 16.16 to 20.90 g; area: 8.88 to 11.03 mm²; seed length: 3.51 to 3.83 mm; seed width: 3.28 to 3.71 mm; seed perimeter: 9.78 to 10.7 mm). Similarly, Shahin et al. (2012) captured three-dimensional images of lentil seeds with cameras to measure seed plumpness, while Shahin and Symons (2001) deployed computer-aided two-dimensional imaging to measure the diameter of lentil seeds. Previous studies also demonstrated large variation for various seed parameters in lentils (Tullu et al., 2001; Tripathi et al., 2022). Thus, VideometerLab was found to be a precise, quick, and easy method for determining several seed parameters.
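
The group ranges quoted in this discussion can be compared directly. The following minimal Python sketch is illustrative only and is not part of the analysis performed in this study. It checks whether the reported small-seeded and large-seeded ranges overlap for each trait. The dictionary names and the helper functions `ranges_overlap` and `summarize` are hypothetical, introduced here for illustration; the numerical values are copied from the text.

```python
# Reported (min, max) ranges copied from the discussion text.
# Dictionary and function names are illustrative assumptions only.
SEED_MORPHOLOGY = {
    "1000-seed weight (g)": {"small": (16.16, 20.90), "large": (34.7, 42.13)},
    "area (mm^2)": {"small": (8.88, 11.03), "large": (17.36, 22.59)},
    "length (mm)": {"small": (3.51, 3.83), "large": (4.9, 5.57)},
    "width (mm)": {"small": (3.28, 3.71), "large": (4.56, 5.24)},
    "perimeter (mm)": {"small": (9.78, 10.7), "large": (13.47, 15.47)},
}

CELL_WALL = {
    "FT-IR cellulose (%)": {"small": (22.4, 39.16), "large": (21.25, 28.6)},
    "FT-IR lignin (%)": {"small": (10.73, 26.85), "large": (11.16, 15.4)},
    "ABSL (%)": {"small": (1.26, 4.55), "large": (1.08, 2.19)},
    "xylose (mg/g)": {"small": (3.083, 12.18), "large": (2.22, 7.15)},
    "acetyl (mg/g)": {"small": (4.02, 10.23), "large": (2.01, 6.14)},
}


def ranges_overlap(a, b):
    """Return True if the closed intervals a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def summarize(traits):
    """Map each trait name to True when the two group ranges overlap."""
    return {
        name: ranges_overlap(groups["small"], groups["large"])
        for name, groups in traits.items()
    }


if __name__ == "__main__":
    for label, traits in (("Seed morphology", SEED_MORPHOLOGY),
                          ("Cell wall composition", CELL_WALL)):
        print(label)
        for name, overlaps in summarize(traits).items():
            status = "ranges overlap" if overlaps else "ranges separated"
            print(f"  {name}: {status}")
```

With the ranges as quoted, the small- and large-seeded groups are separated for every listed morphological trait, whereas the ranges overlap for every listed cell wall component. This is a descriptive comparison of reported ranges only; it does not replace the significance tests summarized in Table 3 and Table S4.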

Conclusions
The results of the study have conclusively shown the importance of the marker PBALC449 in identifying genotypes having small seed size in lentils. In addition, the region identified on chromosome 03 needs more critical attention for the validation of genes regulating the seed size trait in lentils. The cell wall composition, including cellulose, xylan, etc., was extensively analyzed using wet chemistry methods and FT-IR to understand the association between cell extensibility and seed size in lentils. Compared with conventional approaches such as seed weight, caliper measurement, or sieving, VideometerLab 4.0 was found to be very effective, easy, and quick, and should be used for the measurement of various essential seed parameters in lentils. Thus, the information generated in this study has paved the way for further in-depth analysis of the factors governing seed size in lentils, including the development of genotypes having customized seed sizes.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

Author contributions
Conceptualization: GM, HKD, SG, and SK; methodology: HD, SM, SS, MA, MT, SD, PP, AK, KT, and RK; formal analysis: HD, SM, DV, DM, and AS; resources: GM, SK, and HKD; data curation: HKD and GM; writing-original draft preparation: HD, GM, and HKD; writing-review and editing: HKD, GM, SG, and SK; supervision: GM and HKD. All authors contributed to the article and approved the submitted version.

Funding
The work was supported and funded by the Indian Council of Agricultural Research (ICAR) and the International Center for Agricultural Research in the Dry Areas (ICARDA).

SUPPLEMENTARY FIGURE 4
BLAST result for the sequence from the small-seeded parent (L830) (131 bp, due to an 18 bp deletion at two places).

SUPPLEMENTARY FIGURE 5
Identified position of the SSR marker (PBALC449) in the lentil genome (chromosome 3).

SUPPLEMENTARY FIGURE 6
Sequence similarity of the marker (PBALC449) with related species. It shows the highest similarity to Medicago truncatula, followed by Cicer arietinum.

SUPPLEMENTARY FIGURE 7
Representative image of eight lentil RIL genotypes, captured by VideometerLab 4.0 at two wavelengths (590 and 850 nm) for further seed parameter analysis.

SUPPLEMENTARY FIGURE 8
Image of the lentil genotypes (L830 and L4602) captured by VideometerLab 4.0 at 19 wavelengths (375 to 970 nm) for further seed parameter analysis.